klotz: data science

  1. Exploring and exploiting the seemingly innocent theorem behind Double Machine Learning. The theorem, rooted in econometrics, states that if a linear model predicts an outcome from multiple features and we want the causal effect of one specific feature, we can regress both the outcome and that feature on the remaining features, then use the resulting residuals as an instrumental variable to estimate the causal effect (a minimal sketch follows).
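
A minimal sketch of that partialling-out idea, assuming synthetic data and plain linear regressions in place of whatever models and data the article uses:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data: X = other features, d = feature of interest, y = outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
d = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=1000)
y = 2.0 * d + X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=1000)

# residualize both the treatment and the outcome against the other features
d_res = d - LinearRegression().fit(X, d).predict(X)
y_res = y - LinearRegression().fit(X, y).predict(X)

# regressing residuals on residuals recovers the causal coefficient (~2.0 here)
theta = LinearRegression().fit(d_res.reshape(-1, 1), y_res).coef_[0]
print(theta)
```
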
  2. Discusses reasons why clustering in data science might not produce desired results and how to address these issues.
  3. This article features a curated list of the top data science articles published in July, covering topics such as LLM apps, ChatGPT, data visualization, multi-agent AI systems, and essential data science skills for 2024.
  4. An article discussing the current state, recent approaches, and future directions of prompt engineering in data and machine learning. It includes several links to relevant articles and tutorials on the topic.
  5. An overview of the LIDA library, including how to get started, examples, and considerations going forward, with a focus on large language models (LLMs) and image generation models (IGMs) in data visualization and business intelligence.
    2024-06-26 by klotz
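
A hedged quickstart sketch along the lines of LIDA's documented usage; the data path, the goal count, and the OpenAI backend are assumptions, and exact signatures may vary between library versions:

```python
from lida import Manager, llm  # pip install lida

# assumes an OPENAI_API_KEY in the environment; other LLM backends exist
lida = Manager(text_gen=llm("openai"))

summary = lida.summarize("data/cars.csv")        # dataset summary fed to the LLM
goals = lida.goals(summary, n=2)                 # auto-generated visualization goals
charts = lida.visualize(summary=summary, goal=goals[0])  # chart code + rendered image
```
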
  6. This article discusses the importance of understanding and memorizing classification metrics in machine learning. The author shares their own experience and strategies for memorizing metrics such as accuracy, precision, recall, F1 score, and ROC AUC.
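
The definitions behind those metrics are quick to check in code. A small scikit-learn sketch on a synthetic, imbalanced binary problem (the dataset and classifier are illustrative, not from the article):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# illustrative setup: an 80/20 imbalanced binary classification problem
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probabilities, needed for ROC AUC

print("accuracy: ", accuracy_score(y_test, y_pred))   # (TP+TN) / all
print("precision:", precision_score(y_test, y_pred))  # TP / (TP+FP)
print("recall:   ", recall_score(y_test, y_pred))     # TP / (TP+FN)
print("f1:       ", f1_score(y_test, y_pred))         # harmonic mean of P and R
print("roc auc:  ", roc_auc_score(y_test, y_score))   # threshold-free ranking quality
```
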
  7. This article explains the PCA algorithm and its implementation in Python. It covers key concepts such as dimensionality reduction, eigenvectors, and eigenvalues, aiming to give a solid understanding of the algorithm's inner workings and its use in dealing with high-dimensional data and the curse of dimensionality.
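
A bare-bones NumPy version of the centering, covariance, and eigendecomposition steps the tutorial covers (the random data is a placeholder):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # placeholder for high-dimensional data

# 1. center the data
X_centered = X - X.mean(axis=0)

# 2. covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# 3. eigendecomposition: eigenvectors are the principal axes,
#    eigenvalues the variance each axis explains
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. sort by decreasing eigenvalue and keep the top k components
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

k = 2
X_reduced = X_centered @ eigenvectors[:, :k]  # project onto the top-k axes

explained = eigenvalues[:k].sum() / eigenvalues.sum()
print(f"variance explained by {k} components: {explained:.2%}")
```
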
  8. This tutorial covers fine-tuning BERT for sentiment analysis using Hugging Face Transformers. It walks through preparing data, setting up the environment, training and evaluating the model, and making predictions.
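
A hedged outline of that fine-tuning loop with Hugging Face Transformers; the bert-base-uncased checkpoint, the IMDB dataset, and the small training subsets are assumptions standing in for the tutorial's actual choices:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# assumption: IMDB reviews stand in for whatever sentiment data the tutorial uses
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="bert-sentiment",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)

# small subsets keep this sketch cheap to run; a real run would use the full splits
trainer = Trainer(model=model,
                  args=args,
                  train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
                  eval_dataset=dataset["test"].select(range(500)))

trainer.train()
print(trainer.evaluate())
```
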
  9. This article explains the concept and use of Friedman's H-statistic for finding interactions in machine learning models.

    - The H-statistic is model-agnostic and built from partial dependence functions, so it can capture interactions that are not linear.
    - For a pair of features, it compares the two-way partial dependence with the sum of the two individual partial dependences; any gap between the two is attributed to interaction.
    - Squaring that gap, summing it over the observed data points, and normalizing by the variance of the two-way partial dependence gives H², a value between 0 and 1.
    - The higher the H-stat, the stronger the interaction effect; a value of 0 means the two features contribute purely additively.
    - The article provides a step-by-step process for calculating the H-stat, using an example with a hypothetical dataset about the effects of asbestos exposure on lung cancer for smokers and non-smokers (a minimal code sketch of the computation follows this list).
    - The author also discusses the statistic's limitations, such as its computational cost (many model evaluations are needed) and the fact that the pairwise version only detects interactions between two variables at a time.
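
A minimal from-scratch sketch of the pairwise H² computation described above; the gradient-boosted model, the synthetic data, and the Monte-Carlo partial dependence helper are illustrative choices, not taken from the article:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# synthetic data with a genuine interaction between features 0 and 1
X = rng.uniform(-1, 1, size=(500, 3))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor().fit(X, y)

def partial_dependence(model, X, features, grid_points):
    """Monte-Carlo partial dependence: average prediction with the given
    features clamped to each row of grid_points, centered to mean zero."""
    pd_vals = np.empty(len(grid_points))
    for i, point in enumerate(grid_points):
        X_mod = X.copy()
        X_mod[:, features] = point
        pd_vals[i] = model.predict(X_mod).mean()
    return pd_vals - pd_vals.mean()

def h_statistic(model, X, j, k):
    """Friedman's pairwise H^2: share of the variance of the two-way partial
    dependence not explained by the sum of the two one-way terms."""
    pd_jk = partial_dependence(model, X, [j, k], X[:, [j, k]])
    pd_j = partial_dependence(model, X, [j], X[:, [j]])
    pd_k = partial_dependence(model, X, [k], X[:, [k]])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

print(h_statistic(model, X, 0, 1))  # close to 1: real interaction
print(h_statistic(model, X, 0, 2))  # near 0: this pair is additive
```
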
  10. The Towards Data Science team highlights recent articles on the rise of open-source LLMs, ethical considerations with chatbots, potential manipulation of LLM recommendations, and techniques for temperature scaling and re-ranking in generative AI.
